Depth Prediction Without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos
Learning to predict scene depth from RGB inputs is a challenging task for
both indoor and outdoor robot navigation. In this work, we address
unsupervised learning of scene depth and robot ego-motion where supervision
is provided by monocular videos, as cameras are the cheapest, least
restrictive, and most ubiquitous sensor for robotics.
Previous work in unsupervised image-to-depth learning has established strong
baselines in the domain. We propose a novel approach that produces
higher-quality results, models moving objects, and transfers across data
domains, e.g., from outdoor to indoor scenes. The main idea is to introduce
geometric structure into the learning process by modeling the scene and the
individual objects; camera ego-motion and object motions are learned from
monocular videos as input. Furthermore, an online refinement method is
introduced to adapt learning on the fly to unknown domains.
The proposed approach outperforms all state-of-the-art approaches, including
those that handle motion, e.g., through learned flow. Our results are
comparable in quality to those that use stereo as supervision, and they
significantly improve depth prediction on scenes and datasets that contain
substantial object motion. The approach is of practical relevance, as it
allows transfer across environments, by transferring models trained on data
collected for robot navigation in urban scenes to indoor navigation
settings. The code associated with this paper can be found at
https://sites.google.com/view/struct2depth.
Comment: Thirty-Third AAAI Conference on Artificial Intelligence (AAAI'19)
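
To make the idea concrete, below is a minimal PyTorch sketch of the
photometric view-synthesis loss that typically drives this kind of
unsupervised depth and ego-motion training. It is an illustration under
assumed tensor shapes, not the authors' implementation; project_to_source
is a hypothetical helper standing in for the back-projection and warp
geometry.

import torch.nn.functional as F

def view_synthesis_loss(target, source, depth, pose, K):
    # target, source: (B, 3, H, W) adjacent RGB video frames
    # depth:          (B, 1, H, W) predicted depth for the target frame
    # pose:           (B, 4, 4)   predicted relative camera motion
    # K:              (B, 3, 3)   camera intrinsics
    # Back-project target pixels to 3D, move them by the predicted pose,
    # and project into the source frame to get sampling coordinates.
    grid = project_to_source(depth, pose, K)  # hypothetical helper, (B, H, W, 2)
    warped = F.grid_sample(source, grid, align_corners=False)
    return (target - warped).abs().mean()  # L1 photometric error

Minimizing this loss jointly over the depth and ego-motion networks recovers
both quantities without ground-truth depth; the paper's contribution adds
per-object motion modeling and online refinement on top of this signal.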
Consistent Multimodal Generation via A Unified GAN Framework
We investigate how to generate multimodal image outputs, such as RGB, depth,
and surface normals, with a single generative model. The challenge is to
produce outputs that are both realistic and consistent with each other. Our
solution builds on the StyleGAN3 architecture, with a shared backbone and
modality-specific branches in the last layers of the synthesis network, and we
propose per-modality fidelity discriminators and a cross-modality consistency
discriminator. In experiments on the Stanford2D3D dataset, we demonstrate
realistic and consistent generation of RGB, depth, and normal images. We also
show a training recipe to easily extend our pretrained model to a new
domain, even with only a small amount of paired data. We further evaluate
the use of synthetically generated RGB and depth pairs for training or
fine-tuning depth estimators. Code will be available at
https://github.com/jessemelpolio/MultimodalGAN.
Comment: In review
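
As an illustration of the architectural idea (a sketch with assumed layer
shapes and a generic backbone module, not the released code), a generator
can share one synthesis trunk and split into modality-specific branches
only in the last layers:

import torch.nn as nn

class MultimodalGenerator(nn.Module):
    def __init__(self, backbone, feat_ch=64):
        super().__init__()
        self.backbone = backbone                     # shared synthesis trunk
        self.rgb_head = nn.Conv2d(feat_ch, 3, 1)     # RGB branch
        self.depth_head = nn.Conv2d(feat_ch, 1, 1)   # depth branch
        self.normal_head = nn.Conv2d(feat_ch, 3, 1)  # surface-normal branch

    def forward(self, z):
        feats = self.backbone(z)  # shared features for all modalities
        return (self.rgb_head(feats),
                self.depth_head(feats),
                self.normal_head(feats))

Training would then pair one fidelity discriminator per modality with a
consistency discriminator that scores concatenated (RGB, depth, normal)
tuples, matching the loss design the abstract describes.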
Socially Compliant Navigation Dataset (SCAND): A Large-Scale Dataset of Demonstrations for Social Navigation
Social navigation is the capability of an autonomous agent, such as a robot,
to navigate in a 'socially compliant' manner in the presence of other
intelligent agents such as humans. With the emergence of autonomously
navigating mobile robots in human-populated environments (e.g., domestic
service robots in homes and restaurants, and food-delivery robots on public
sidewalks), incorporating socially compliant navigation behaviors in these
robots becomes critical to ensuring safe and comfortable human-robot
coexistence.
coexistence. To address this challenge, imitation learning is a promising
framework, since it is easier for humans to demonstrate the task of social
navigation than to formulate reward functions that accurately capture the
complex multi-objective setting of social navigation. The use of imitation
learning and inverse reinforcement learning for social navigation on mobile
robots, however, is currently hindered by a lack of large-scale datasets that
capture socially compliant robot navigation demonstrations in the wild.
To fill this gap, we introduce the Socially CompliAnt Navigation Dataset
(SCAND), a large-scale, first-person-view dataset of socially compliant
navigation demonstrations. Our dataset contains 8.7 hours, 138 trajectories,
and 25 miles of socially compliant, human-teleoperated driving demonstrations
that comprise multimodal data streams, including 3D lidar, joystick commands,
odometry, and visual and inertial information, collected on two
morphologically different mobile robots (a Boston Dynamics Spot and a
Clearpath Jackal) by four different human demonstrators in both indoor and
outdoor environments.
We additionally perform a preliminary analysis and validation through
real-world robot experiments and show that navigation policies learned by
imitation learning on SCAND generate socially compliant behavior.
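
For intuition on how such demonstrations can be used, here is a minimal
behavioral-cloning sketch; the flat observation vector and two-dimensional
velocity command are assumptions for illustration, not SCAND's actual
schema.

import torch.nn as nn

class NavigationPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim=2):  # e.g., (linear, angular) velocity
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, act_dim),
        )

    def forward(self, obs):
        return self.net(obs)

def bc_step(policy, optimizer, obs, demo_actions):
    # Regress the demonstrated joystick command from the observation.
    loss = nn.functional.mse_loss(policy(obs), demo_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()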
Principles and Guidelines for Evaluating Social Robot Navigation Algorithms
A major challenge to deploying robots widely is navigation in human-populated
environments, commonly referred to as social robot navigation. While the field
of social navigation has advanced tremendously in recent years, the fair
evaluation of algorithms that tackle social navigation remains hard because it
involves not just robotic agents moving in static environments but also dynamic
human agents and their perceptions of the appropriateness of robot behavior. In
contrast, clear, repeatable, and accessible benchmarks have accelerated
progress in fields like computer vision, natural language processing, and
traditional robot navigation by enabling researchers to fairly compare
algorithms, revealing limitations of existing solutions and illuminating
promising new directions. We believe the same approach can benefit social
navigation. In this paper, we pave the road towards common, widely accessible,
and repeatable benchmarking criteria to evaluate social robot navigation. Our
contributions include (a) a definition of a socially navigating robot as one
that respects the principles of safety, comfort, legibility, politeness, social
competency, agent understanding, proactivity, and responsiveness to context,
(b) guidelines for the use of metrics, development of scenarios, benchmarks,
datasets, and simulators to evaluate social navigation, and (c) a design of a
social navigation metrics framework to make it easier to compare results from
different simulators, robots, and datasets.
Comment: 43 pages, 11 figures, 6 tables
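
As one concrete example of the kind of metric such a framework could
standardize (a sketch, not the paper's specification; the 1.2 m
personal-space radius is an assumed threshold):

import numpy as np

def personal_space_violation_rate(robot_xy, humans_xy, radius=1.2):
    # robot_xy:  (T, 2)    robot positions over T timesteps
    # humans_xy: (T, N, 2) positions of N humans at each timestep
    # Returns the fraction of timesteps where the robot is within
    # `radius` meters of any human.
    dists = np.linalg.norm(humans_xy - robot_xy[:, None, :], axis=-1)  # (T, N)
    return float((dists.min(axis=1) < radius).mean())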
Non-Realistic 3D Object Stylization
In this paper, we introduce the novel paradigm of non-realistic 3D stylization, where the expressiveness of a given 3D model is manifested in the 3D shape itself rather than only in its rendering. We analyze the input model using abstraction, simplification, and symmetrization operators to determine important features that are later represented by new geometry. In doing so, we create a stylized and expressive representation of the input that can be rendered or printed with a 3D printer. We conducted a user study to verify the proposed stylizations and demonstrate the approach on standard building geometry as well as natural and technical objects.
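
The simplification operator, for instance, can be approximated with
off-the-shelf mesh decimation; the following Open3D sketch (with a
hypothetical input file, and not the paper's actual operator) reduces a
building mesh to a coarse, abstracted version of itself:

import open3d as o3d

mesh = o3d.io.read_triangle_mesh("building.obj")  # hypothetical input file
mesh.compute_vertex_normals()
# Collapse detail so only the dominant shape features remain.
simplified = mesh.simplify_quadric_decimation(target_number_of_triangles=500)
o3d.io.write_triangle_mesh("building_stylized.obj", simplified)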